ABSTRACT


A major reason for the spread of fake news is the large number of fake and hyperpartisan
sites on the Internet. These sites manipulate the truth, which creates misunderstanding in
society. It is therefore important to detect fake news and make people aware of the truth.
This paper gives an insight into how to detect fake news using Machine Learning and Deep
Learning techniques. Our dataset has five attributes, namely Title, Text, Subject, Date,
and Label. To develop an efficient fake news detection system, each feature and its degree
of impact on the system must be taken into consideration. This paper provides a detailed
analysis of fake news detection using various models such as LSTM, ANN, Naive Bayes, SVM,
Logistic Regression, XGBoost, and BERT.


1. INTRODUCTION


The number of bogus websites on the Internet has surged in recent years. Fake news and a
lack of faith in the media are creating problems in our society that have far-reaching
consequences. As the volume of fake news on the Internet appears to increase, some people
may find it progressively harder to tell what is true and what is not. As the 2016
Presidential election demonstrated, a lack of information literacy can have real-world
consequences. Because fake news can be quite compelling, it is important to build the skills
needed to recognize and critically examine the news you read on social media and other
platforms [1].

Fake news campaigns are a type of modern information warfare that nations and other
entities undertake to weaken their opponents' authority and legitimacy. Individuals who check
the accuracy of published news are known as fact-checkers. These experts deconstruct fake
news by pointing out its inaccuracies. According to research, traditional fact-checking can
be supplemented by machine learning and Natural Language Processing (NLP).

Fake news identification is a difficult undertaking because of the size of the system and
the intricacies of the forces that influence it. The parameters chosen are critical to the
development of an effective fake news detection system. When reading news on the Internet,
people must evaluate a variety of elements such as the article's source, the date and time,
the author's name, and other aspects that bear on whether the news is fake. These aspects
are dynamic in nature, and the selection of prominent features is crucial for producing an
accurate prediction [2].

Artificial intelligence is a general term for techniques that enable computers to mimic
human behavior. This is made possible by machine learning, which is a set of algorithms
that have been trained on data.


Figure 1 Artificial Intelligence (an algorithm that can recognize, react, and adjust),
Machine Learning (supervised and unsupervised learning), and Deep Learning (artificial
neural network)


Deep Learning, on the other hand, is a kind of Machine Learning that is inspired by the
structure of the human brain. Deep learning algorithms analyze data with a predetermined
logical structure in order to reach conclusions similar to those of humans. Deep learning
does this by employing a multi-layered structure of algorithms known as "Neural Networks".


The design of the neural network is inspired by the structure of the human brain. Neural
networks can be trained to perform the same tasks on data that our brains do when
identifying patterns and classifying different kinds of information.


The individual layers of a neural network can also be thought of as a kind of filter that
works from coarse to fine and increases the likelihood of detecting and producing a correct
result. The human brain works similarly. Whenever we receive new information, the brain
tries to compare it with known items. Deep neural networks make use of the same notion.


Tasks such as clustering, classification, or regression can be accomplished with the help
of an Artificial Neural Network. These networks allow us to group or sort unlabeled data
according to similarities between the samples in that data. In the case of classification,
the networks can be trained on labeled datasets to classify the samples in the dataset into
categories, which can be either binary or multiclass.

As a rule, neural networks can perform the same tasks as conventional ML methods, but the
reverse is not true.


Deep learning is responsible for many recent developments in artificial intelligence.
Self-driving cars, chatbots, and virtual assistants like Google Assistant and Siri would not
exist without deep learning. Amazon Prime or Disney+ Hotstar would have no notion of which
movies or TV shows we like to watch, and the Google Translate app would remain as basic as
it was a decade ago (before Google transitioned to neural networks for this app). Neural
networks are at the heart of all of these innovations.


We could even go so far as to suggest that deep learning and ANNs are driving a new
industrial revolution today.

Deep learning is, at the end of the day, the best and most direct approach to genuine
machine intelligence that we have so far.


Before deep learning came into use, conventional machine learning methods were mainly
used. Machine Learning methods such as Decision Trees, SVM, the Naive Bayes classifier,
Logistic Regression, and XGBoost are discussed further and compared with deep learning
models such as LSTM, ANN, and BERT [3].


Figure 2 Performance versus amount of data (curve labeled: Large Neural Network)


Conventional machine learning techniques such as SVM and Naive Bayes classifiers stop
improving beyond a saturation point, whereas Deep Learning models continue to improve with
increasing amounts of training data [4].


2. BACKGROUND AND RELATED WORK 


Before giving the details of the proposed Deep Learning and Machine Learning algorithms,
we present the problem statement and related work [5].


2.1 Problem Statement 


The use of online news publishing platforms for news consumption cuts both ways. On the
one hand, consumers seek out and consume news via these platforms because of their low
cost, easy accessibility, and fast transmission of information. On the other hand, they
facilitate the wide dissemination of "fake news", that is, low-quality news that contains
intentionally deceptive material. The pervasive spread of fake news can have tremendously
detrimental effects on both individuals and society. As a result, detecting fake news has
recently become an emerging research topic that is attracting considerable attention.


First, fake news is purposefully written to make readers believe information that is not
true, which makes it difficult to detect on the basis of news content alone; as a result,
we must include auxiliary information, such as user social engagements on the online news
platform, to assist in making a decision. Second, utilizing this auxiliary data is difficult
in and of itself, as consumers' social interactions with fake news generate large,
incomplete, unstructured, and noisy data.


2.2 Dataset 


The factors that influence fake news vary considerably. To tackle fake news, we have used
a dataset available on Kaggle which consists of five attributes, namely Title, Text,
Subject, Date, and Label. Title contains the news heading, whereas Text contains the details
of the news. Label indicates whether the news is true or fake. Subject contains the category
to which the news belongs, for example political, leftist, or rightist. Date contains the
publication date of the article. A fake news detection system is designed considering these
factors.


Figure 3 Snapshot of our Dataset (columns: title, text, subject, date, fake; the rows shown
are Reuters articles dated 2017)

Figure 4 Distribution of data


2.3 Methodology Used
2.3.1 Natural Language Processing (NLP) 


Text mining is used to identify facts, relationships, and assertions that would otherwise
remain hidden in a large volume of textual data. Once extracted, this information is
converted into a structured form that can be analyzed further to gain meaningful insights.
Text mining uses a variety of techniques to process text; one of the most important and
widely used is Natural Language Processing.


Natural Language Processing (NLP) is the ability of a program to interpret spoken and
written human language, often known as natural language. It is a component of artificial
intelligence. NLP allows computers to understand natural language in much the same manner
that humans do. Artificial intelligence is used in NLP to take in real-world input, process
it, and make sense of it in a way that a computer can understand, regardless of whether the
language is spoken or written. Just as people have different sensors, such as ears to hear
and eyes to see, computers have programs to read and microphones to gather audio. During
processing, the input is converted to code that the computer can interpret [6].

There are two primary stages in NLP: data preprocessing and algorithm development.


Data preprocessing involves preparing and "cleaning" text data so that machines can
analyze it. Preprocessing puts data in a workable form and highlights features in the text
that an algorithm can work with. This can be done in several ways, including:


• Tokenization - Text is broken down into smaller units to work with.

• Stop word removal - Common words are removed from the text so that the unique words
that offer the most information about the text remain.

• Lemmatization - A word is reduced to its base form so that different forms of the same
word are grouped together.

• Stemming - Inflected words are reduced to their root forms.
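
The preprocessing steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list and the suffix rules are made up for the example, the crude suffix stripper is a stand-in for a real stemmer, and lemmatization is omitted because it needs a dictionary of word forms.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "this", "out", "all", "to", "of", "and"}

# Suffixes removed by the crude stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop common words that carry little information."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("Trump sends out embarrassing message to all Americans"))
# ['trump', 'send', 'embarrass', 'message', 'american']
```

A production pipeline would normally rely on an established stemmer or lemmatizer rather than hand-written suffix rules.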

Once the data has been preprocessed, an algorithm is developed to process it. There are
many different natural language processing algorithms, but two main types are commonly
used:

Machine Learning-Based System. ML algorithms use statistical methods, that is, they
describe phenomena in terms of numbers and then use those numbers to infer cause and
effect. They learn to perform tasks based on the training data they are given, and they
adjust their methods as more data is processed. Using a combination of ML, deep learning,
and neural networks, natural language processing algorithms refine their own rules through
repeated processing and learning.


Deep Learning Based System 


Figure 5 Sample text before preprocessing ("Donald Trump Sends Out Embarrassing New Year's
Eve Message; This is Disturbing...") and after preprocessing ("trump send embarrass new year
eve messag disturbingdonald trump wish american happi new year")


2.3.2 Naive Bayes Classifier 


It is a classification method based on Bayes' theorem and the assumption that the
predictor variables are independent. In simple terms, the Naive Bayes classifier assumes
that the presence of a particular feature in a class is unrelated to the presence of any
other feature.


For example, a fruit may be considered an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or on the existence of other
features, each of them is treated as contributing independently to the probability that
this fruit is an apple, which is why the classifier is called "Naive".


The Naive Bayes model is easy to build and is particularly useful for very large datasets.
Despite its simplicity, Naive Bayes can outperform even more sophisticated classification
methods [7]. Bayes' theorem provides a way of calculating the posterior probability P(c|x),
that is, the revised probability of an event occurring after taking new information into
consideration, from P(c), P(x), and P(x|c):

P(c|x) = [P(x|c) . P(c)] / P(x)

Where,

P(c|x) = Posterior Probability

P(x|c) = Likelihood

P(c) = Class Prior Probability

P(x) = Predictor Prior Probability
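
As a worked example of Bayes' theorem, the short sketch below computes a posterior for a single word. All probabilities are hypothetical numbers chosen for illustration; they are not estimated from the dataset used in this paper.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood * prior / evidence


# Hypothetical values: c = "article is fake", x = "article contains a given word".
p_fake = 0.5               # P(c), class prior
p_word_given_fake = 0.25   # P(x|c), likelihood
p_word_given_true = 0.05   # P(x|not c)

# P(x) follows from the law of total probability over both classes.
p_word = p_word_given_fake * p_fake + p_word_given_true * (1 - p_fake)

p_fake_given_word = posterior(p_word_given_fake, p_fake, p_word)
print(round(p_fake_given_word, 4))
```

With these made-up numbers, seeing the word raises the probability that the article is fake from the prior of 0.5 to 5/6.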


2.3.2.1 Gaussian Naive Bayes 


Gaussian Naive Bayes assumes that the continuous values associated with each feature
follow a Gaussian distribution, which is also called a Normal distribution.


2.3.2.2 Multinomial Naive Bayes 


Feature vectors represent the frequencies with which certain events have been generated by
a multinomial distribution. This is the event model most commonly used for document
classification.
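
To make the multinomial event model concrete, the following is a minimal sketch of a Multinomial Naive Bayes classifier over word counts. The toy corpus is made up, and add-one (Laplace) smoothing is an assumption of this sketch; the paper does not state which smoothing its implementation used.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels):
    """Estimate log priors and add-one-smoothed log likelihoods from token lists."""
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for doc in class_docs for tok in doc)
        total = sum(counts.values())
        log_likelihood[c] = {
            tok: math.log((counts[tok] + 1) / (total + len(vocab))) for tok in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log posterior score; unknown words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in doc if t in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Made-up toy corpus: label 1 = fake, label 0 = true.
docs = [
    ["shocking", "secret", "cure"],
    ["senate", "passes", "budget"],
    ["shocking", "hoax", "exposed"],
    ["court", "reviews", "budget"],
]
labels = [1, 0, 1, 0]

model = train_multinomial_nb(docs, labels)
print(predict(["shocking", "cure"], *model))   # 1
print(predict(["budget", "court"], *model))    # 0
```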


2.3.3 Logistic Regression 
Logistic Regression is a Supervised Learning technique and one of the most widely used
Machine Learning algorithms. It is a model for predicting a categorical dependent variable
from a set of independent variables.

As a result, the output must be a discrete or categorical value. It can be any binary value
such as Yes or No, 0 or 1, true or false, and so on. Rather than giving the exact values 0
and 1, it gives probabilistic values which lie between 0 and 1 [8].

Figure 6


Apart from how they are used, Logistic Regression is very similar to Linear Regression.
Linear Regression is employed for regression problems, while Logistic Regression is used
for classification problems.
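
The relationship between the two models can be illustrated with a short sketch: Logistic Regression passes the same kind of linear combination used by Linear Regression through the sigmoid function, which yields a value between 0 and 1 that is then thresholded. The weights, bias, and the 0.5 threshold below are hypothetical and not fitted to any data.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 decision."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters for a two-feature model.
weights, bias = [2.0, -1.0], 0.0

print(predict_proba([0.0, 0.0], weights, bias))   # 0.5
print(predict_label([1.0, 0.5], weights, bias))   # 1
print(predict_label([0.0, 2.0], weights, bias))   # 0
```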


2.3.4 Support Vector Machine 


The Support Vector Machine (SVM) is a supervised Machine Learning algorithm that can be
used for both classification and regression problems. In most cases, it is used for
classification. Each data item is plotted as a point in an n-dimensional space (where n is
the number of distinct features in the dataset), with the value of each feature being the
value of a particular coordinate. We then perform classification by finding the hyperplane
that best separates the two classes [9].


Simply put, support vectors are the coordinates of the individual observations that lie
closest to the separating boundary. The SVM classifier is the boundary (hyperplane or line)
that best separates the two classes.
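
The sketch below shows how a linear SVM classifies a point once a hyperplane is known: the sign of w.x + b gives the class, and the observations closest to the hyperplane play the role of support vectors. The hyperplane and points are hypothetical, and the code does not train an SVM; finding the best w and b is an optimization problem that is not shown here.

```python
import math


def decision_value(x, w, b):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def distance_to_hyperplane(x, w, b):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(x, w, b)) / norm


# Hypothetical hyperplane x1 + x2 - 3 = 0 in a two-feature space.
w, b = [1.0, 1.0], -3.0
points = [[0.0, 1.0], [3.0, 2.0], [2.0, 2.0]]

labels = [classify(p, w, b) for p in points]
closest = min(points, key=lambda p: distance_to_hyperplane(p, w, b))
print(labels)    # [-1, 1, 1]
print(closest)   # [2.0, 2.0]
```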


Figure 7 Support Vectors


2.3.5 XGBoost 


XGBoost is a decision tree-based ensemble Machine Learning algorithm that uses a gradient
boosting framework. In prediction problems involving unstructured data (text, images, etc.),
neural networks tend to outperform other algorithms. However, for small-to-medium
structured or tabular data, tree-based algorithms are efficient and robust [10].
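
The idea of gradient boosting can be sketched with one-split trees (stumps) on a single feature. This is plain gradient boosting for squared error, not XGBoost itself, which additionally uses a regularized objective and second-order gradient information. The data, the number of rounds, and the learning rate are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped correction."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Made-up one-feature dataset with a step between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

model = fit_boosted(xs, ys)
print([round(boosted_predict(model, x), 3) for x in xs])
```

Each stump on its own is a weak learner; the ensemble becomes accurate because every round corrects what the previous rounds left unexplained.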


2.3.6 Long Short-Term Memory (LSTM)


Recurrent neural networks (RNNs) are a kind of neural network in which the outputs of
previous steps are used as inputs to the current step. This property makes RNNs well suited
to sequential prediction tasks. However, because of the vanishing gradient problem, which
is caused by gradient propagation through the recurrent network, these networks are weak at
learning long-term dependencies in long inputs. Long Short-Term Memory networks, or LSTMs,
are a form of RNN that can learn long-term dependencies. They perform very well on a broad
variety of problems and are currently in widespread use. LSTMs are specifically designed to
avoid the long-term dependency problem. Remembering information for long periods of time is
their default behavior rather than something they struggle to learn. All recurrent neural
networks take the form of a chain of repeating neural network modules [11].





Figure 8 Structure of an LSTM memory cell (legend: layer, componentwise operation, copy,
concatenate)


Each conventional neuron in the hidden layer of a regular RNN is replaced by an LSTM
memory unit. Figure 8 shows the structure of an LSTM memory cell. The flow of information
into and out of the cell is controlled by an input gate, a forget gate, and an output gate
in the LSTM unit.
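
The role of the three gates can be illustrated with a single LSTM time step written for scalar values. This is a simplified sketch of the standard LSTM update; real layers operate on vectors and weight matrices, and every weight below is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell with scalar input, hidden state, and cell state."""

    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    i = gate("input", sigmoid)         # how much new information to write
    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate value to write into the cell

    c = f * c_prev + i * g             # updated cell state (the long-term memory)
    h = o * math.tanh(c)               # updated hidden state (the output)
    return h, c


# Hypothetical (w_x, w_h, bias) triples for each gate.
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.6, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through a step almost unchanged, which is how the cell can carry information across many time steps.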


2.3.7 Artificial Neural Network (ANN) 


An ANN is a computing model that can be used for tasks such as prediction, classification,
and decision-making. It is made up of artificial neurons, which are loosely modeled on the
neurons of the human brain. In the brain, neurons send the signals that carry out
activities; artificial neurons are linked in a neural network to complete tasks in a similar
way. The strength of the connection between two artificial neurons is called its weight
[12].


• Since many relationships between inputs and outputs are non-linear, it matters that an
ANN can learn and model non-linear and complex relationships.


• After training, an ANN can infer unseen relationships from previously unseen data, which
allows it to generalize.


• Unlike many other machine learning models, an ANN places no requirements on the
distribution of the input data, such as a Gaussian distribution or any other distribution.
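
The following sketch shows a forward pass through a very small network with one hidden layer. The weights are chosen by hand (they are not trained and not taken from the paper) so that the output is high only when exactly one of the two inputs is active, a non-linear relationship that a single linear boundary cannot represent.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron applies an activation to a weighted sum."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: inputs -> hidden layer (ReLU) -> single sigmoid output."""
    hidden = dense(inputs, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]


# Hand-picked weights: 2 inputs, 2 hidden neurons, 1 output neuron.
hidden_w = [[1.0, -1.0], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [[2.0, 2.0]]
out_b = [-1.0]

for sample in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(sample, round(forward(sample, hidden_w, hidden_b, out_w, out_b), 3))
```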

Figure 9


2.3.8 BERT


BERT, or Bidirectional Encoder Representations from Transformers, improves on standard
Transformers by removing the unidirectionality constraint through pre-training with a
masked language model (MLM). The masked language model randomly masks some of the tokens
from the input, with the objective of predicting the original vocabulary id of each masked
word based only on its context. Unlike left-to-right language model pre-training, the MLM
objective allows the representation to fuse the left and right contexts, which makes it
possible to pre-train a deep bidirectional Transformer. In addition to the masked language
model, BERT uses a next sentence prediction task, which jointly pre-trains text-pair
representations [13].


Pre-training and fine-tuning are the two steps in BERT. During pre-training, the model is
trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model
is first initialized with the pre-trained parameters, and all of the parameters are then
fine-tuned using labeled data from the downstream tasks. Each downstream task has its own
fine-tuned model, even though all of them are initialized with the same pre-trained
parameters.
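
The masking step of the masked language model can be sketched as follows. This is a simplification: it always substitutes the [MASK] token, whereas BERT masks about 15% of the tokens and sometimes substitutes a random token or leaves the chosen token unchanged. The function name, seed, and example sentence are assumptions made for illustration.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with [MASK]; return the masked tokens and the hidden originals."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token   # what the model would have to predict
        else:
            masked.append(token)
    return masked, targets


tokens = "the senate passed the budget bill on tuesday".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(targets)
```

During pre-training, the model sees the masked sequence and is scored on how well it recovers the tokens stored in `targets`, using context from both the left and the right.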


3. PROPOSED MODELS AND RESULTS


In this section, we look at the results for each model, namely the F1 score, Accuracy,
Recall score, and Area under the ROC curve (ROC AUC). The Confusion Matrix for each model
is also presented in its respective subsection.
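
Accuracy, Recall, and the F1 score can all be derived from the four cells of a confusion matrix, as the sketch below shows. The counts are hypothetical and are not results from this paper; ROC AUC is not covered because it needs the predicted scores rather than the confusion matrix, and the function assumes that no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: true positives, false positives, false negatives, true negatives.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(name, round(value, 4))
```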

TF-IDF stands for "Term Frequency - Inverse Document Frequency". It is a technique for
quantifying words in documents: we compute a weight for each word that signifies the
importance of the word in the document and in the corpus.


This technique is widely used in Information Retrieval and Text Mining.

TF - The number of times a term appears in a document.

IDF - A measure of whether a term is rare or common across the corpus.

TF-IDF[n, d] = TF[n, d] × IDF[n], where n is the term and d is the document; IDF[n] is
computed over all documents in the corpus.
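
A minimal sketch of the TF-IDF weighting follows, using relative term frequency and an unsmoothed logarithmic IDF. These are assumptions of the sketch: scikit-learn's TfidfVectorizer, which is named in the result headers of this paper, applies a smoothed IDF and L2 normalization by default, so its values differ from the ones computed here. The three-document corpus is made up.

```python
import math
from collections import Counter


def term_frequency(term, doc):
    """Share of the document's tokens that are equal to the term."""
    return Counter(doc)[term] / len(doc)


def inverse_document_frequency(term, corpus):
    """log(N / df), where N is the corpus size and df counts documents containing the term."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc, corpus):
    return term_frequency(term, doc) * inverse_document_frequency(term, corpus)


# Made-up corpus of three tokenized documents.
corpus = [
    ["trump", "sends", "new", "year", "message"],
    ["senate", "passes", "new", "budget"],
    ["new", "year", "budget", "talks"],
]

print(tf_idf("new", corpus[0], corpus))     # 0.0, the word occurs in every document
print(tf_idf("trump", corpus[0], corpus))   # positive, the word occurs in one document only
```

A word that appears in every document receives a weight of zero, while a word concentrated in a few documents receives a higher weight.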


3.1 Naive Bayes Classifier 
3.1.1 Gaussian Naive Bayes 


**********Naive Bayes (TfidfVectorizer)**********
F1 score: 0.9795106769257371
Accuracy: 0.96208
ROC AUC: 0.9526970974768894


Figure 11


Figure 11 shows the results of the Gaussian Naive Bayes Classifier.

Figure 12 Confusion Matrix for Gaussian Naive Bayes


3.1.2 Multinomial Naive Bayes 


**********Multinomial Naive Bayes (TfidfVectorizer)**********
F1 score: 0.9890982844587172
Accuracy: 0.97936
ROC AUC: 0.9526970974768894

Figure 13


Figure 13 shows the results of the Multinomial Naive Bayes Classifier.

Figure 14 Confusion Matrix of Multinomial Naive Bayes

3.2 Logistic Regression


**********Logistic Regression (TfidfVectorizer)**********
F1 score: 0.9944629850746268
Accuracy: 0.98944
ROC AUC: 0.9526970974768894


Figure 15


Figure 15 shows the results of Logistic Regression.

Figure 16 Confusion Matrix for Logistic Regression


3.3 Support Vector Machine (SVM) 


**********SVM (TfidfVectorizer)**********
F1 score: 0.9875321249255382
Accuracy: 0.98456
ROC AUC: 0.9898000974022053


Figure 17

Figure 17 shows the results of the Support Vector Machine.

Figure 18 Confusion Matrix for SVM


3.4 XGBoost 


**********XGBoost (TfidfVectorizer)**********
F1 score: 0.9888066825775657
Accuracy: 0.98776
ROC AUC: 0.9891163142433062


Figure 19

Figure 19 shows the results of XGBoost.

Figure 20 Confusion Matrix for XGBoost


3.5 Long Short Term Memory (LSTM) 


[32] model.summary()


Model: "model_LSTM"

Layer (type)                    Output Shape         Param #
input_1 (InputLayer)            [(None, 1)]          0
text_vectorization (TextVect    (None, 418)          0
embedding (Embedding)           (None, 418, 128)     1280000
lstm (LSTM)                     (None, 64)           49408
dense (Dense)                   (None, 1)            65

Total params: 1,329,473
Trainable params: 1,329,473
Non-trainable params: 0

Accuracy Score of LSTM model: 0.9902004454342984
Recall Score of LSTM model: 0.9856068743286789
Precision Score of LSTM model: 0.9954436971143416
F1 Score of LSTM model: 0.9905008635578585


Figure 21


Figure 21 shows the results of LSTM.
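
The parameter counts in the model summary can be reproduced with simple arithmetic, which also makes the layer sizes explicit. The vocabulary size of 10,000 is not stated in the text; it is inferred here from the embedding count of 1,280,000 divided by the embedding width of 128, and the formulas are the standard ones for embedding, LSTM, and dense layers.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Three gates plus the candidate, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """One weight per input for each unit, plus one bias per unit."""
    return input_dim * units + units


vocab_size = 10_000   # inferred from 1,280,000 / 128, not stated in the paper
embed_dim = 128
lstm_units = 64

counts = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, lstm_units),
    "dense": dense_params(lstm_units, 1),
}
total = sum(counts.values())
print(counts)
print(total)
```

The three counts match the 1,280,000, 49,408, and 65 parameters listed in the summary, and their sum matches the reported total of 1,329,473.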

Figure 22 Confusion Matrix for LSTM


3.6 Artificial Neural Network (ANN)


Printing classification_report for Test Set
              precision    recall  f1-score   support
           0       0.97      0.97      0.98      1406
           1       0.97      0.97      0.98      2794
    accuracy                           0.98      4200
   macro avg       0.98      0.98      0.98      4200
weighted avg       0.98      0.98      0.98      4200

Figure 23

Figure 23 shows the results of the Artificial Neural Network.

Figure 24 Training Loss vs Validation Loss

Figure 25 Confusion Matrix for ANN


3.7 BERT


news_bert_report(pred_lis.detach().numpy(), label_lis.detach().numpy())


f1-score: 0.9950845222395396
accuracy: 0.9946980473296263


Figure 26


Figure 26 shows the results of BERT.


4. RESULT AND OUTPUT 


Table 1 


5. CONCLUSION 


Owing to the growth of emerging technologies, the use of online news platforms has
increased rapidly. People now rely on online news publishing platforms to get quick
insights into what is trending, which has led to a decline in the use of news channels and
newspapers. Because of the lack of stringent policies on the Internet, there has been a
significant increase in fraudulent and bogus news. The spread of fake news may cause
widespread harm. Most Internet users share news articles with their friends and families
without checking their credibility, and this has resulted in unnecessary disputes between
different societies and religious communities. In the future, it may become impossible to
distinguish real news from fake news as fake news becomes better disguised. In this digital
age, where hoax news is present everywhere on digital platforms, there is a pressing need
for fake news detection, and the models studied here address that need. Fake news regarding
sensitive topics leads to a toxic environment on the web. Fake News Detection is the
systematic analysis of socially relevant data to determine whether it is real or fake [15].
In this paper, we have compared various Machine Learning methods, namely Naive Bayes,
Logistic Regression, Support Vector Machine, and XGBoost, with Deep Learning methods,
namely ANN, LSTM, and BERT. These methods were tested on a Kaggle dataset. Comparing the
accuracies of all the above models, we conclude that LSTM and BERT outperformed all the
Machine Learning models. To train on a large volume of complex data, we propose a hybrid
model, which should help attain higher accuracy.